AITopics | grammatical error

Collaborating Authors

grammatical error

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

LASER: An LLM-based ASR Scoring and Evaluation Rubric

Parulekar, Amruta, Jyothi, Preethi

arXiv.org Artificial IntelligenceOct-10-2025

Standard ASR evaluation metrics like Word Error Rate (WER) tend to unfairly penalize morphological and syntactic nuances that do not significantly alter sentence semantics. We introduce an LLM-based scoring rubric LASER that leverages state-of-the-art LLMs' in-context learning abilities to learn from prompts with detailed examples. Hindi LASER scores using Gemini 2.5 Pro achieved a very high correlation score of 94% with human annotations. Hindi examples in the prompt were also effective in analyzing errors in other Indian languages such as Marathi, Kannada and Malayalam. We also demonstrate how a smaller LLM like Llama 3 can be finetuned on word-pair examples derived from reference and ASR predictions to predict what kind of penalty should be applied with close to 89% accuracy.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2510.07437

Country: Asia > India (0.28)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Large Language Model-Driven Dynamic Assessment of Grammatical Accuracy in English Language Learner Writing

Jaganov, Timur, Blake, John, Villegas, Julián, Carr, Nicholas

arXiv.org Artificial IntelligenceSep-8-2025

This study investigates the potential for Large Language Models (LLMs) to scale-up Dynamic Assessment (DA). To facilitate such an investigation, we first developed DynaWrite-a modular, microservices-based grammatical tutoring application which supports multiple LLMs to generate dynamic feedback to learners of English. Initial testing of 21 LLMs, revealed GPT-4o and neural chat to have the most potential to scale-up DA in the language learning classroom. Further testing of these two candidates found both models performed similarly in their ability to accurately identify grammatical errors in user sentences. However, GPT-4o consistently outperformed neural chat in the quality of its DA by generating clear, consistent, and progressively explicit hints. Real-time responsiveness and system stability were also confirmed through detailed performance testing, with GPT-4o exhibiting sufficient speed and stability. This study shows that LLMs can be used to scale-up dynamic assessment and thus enable dynamic assessment to be delivered to larger groups than possible in traditional teacher-learner settings.

large language model, learner, machine learning, (20 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/ACCESS.2025.3603191

2505.00931

Country: Asia (0.46)

Genre:

Research Report > New Finding (1.00)
Instructional Material (1.00)

Industry:

Education > Curriculum > Subject-Specific Education (0.49)
Education > Assessment & Standards > Student Performance (0.46)
Education > Educational Setting > Higher Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Imitating Mistakes in a Learning Companion AI Agent for Online Peer Learning

Moribe, Sosui, Ushiama, Taketoshi

arXiv.org Artificial IntelligenceJul-18-2025

In recent years, peer learning has gained attention as a method that promotes spontaneous thinking among learners, and its effectiveness has been confirmed by numerous studies. This study aims to develop an AI Agent as a learning companion that enables peer learning anytime and anywhere. However, peer learning between humans has various limitations, and it is not always effective. Effective peer learning requires companions at the same proficiency levels. In this study, we assume that a learner's peers with the same proficiency level as the learner make the same mistakes as the learner does and focus on English composition as a specific example to validate this approach.

large language model, learner, machine learning, (16 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/IMCOM64595.2025.10857528

2507.12801

Country:

North America > United States (0.68)
Asia > Japan > Kyūshū & Okinawa > Kyūshū (0.14)

Genre:

Research Report > New Finding (1.00)
Instructional Material > Online (0.93)

Industry:

Education > Educational Setting > Online (1.00)
Education > Educational Technology > Educational Software > Computer Based Training (0.68)
Education > Educational Setting > K-12 Education > Secondary School (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.95)
Information Technology > Enterprise Applications > Human Resources > Learning Management (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

A Representation Level Analysis of NMT Model Robustness to Grammatical Errors

Issam, Abderrahmane, Semerci, Yusuf Can, Scholtes, Jan, Spanakis, Gerasimos

arXiv.org Artificial IntelligenceMay-28-2025

Understanding robustness is essential for building reliable NLP systems. Unfortunately, in the context of machine translation, previous work mainly focused on documenting robustness failures or improving robustness. In contrast, we study robustness from a model representation perspective by looking at internal model representations of ungrammatical inputs and how they evolve through model layers. For this purpose, we perform Grammatical Error Detection (GED) probing and representational similarity analysis. Our findings indicate that the encoder first detects the grammatical error, then corrects it by moving its representation toward the correct form. To understand what contributes to this process, we turn to the attention mechanism where we identify what we term Robustness Heads. We find that Robustness Heads attend to interpretable linguistic units when responding to grammatical errors, and that when we fine-tune models for robustness, they tend to rely more on Robustness Heads for updating the ungrammatical word representation.

computational linguistic, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2505.21224

Country:

Europe (1.00)
Asia (1.00)
North America > United States > Minnesota (0.28)

Genre: Research Report > New Finding (0.66)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Machine Translation (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Add feedback

Towards Smarter Hiring: Are Zero-Shot and Few-Shot Pre-trained LLMs Ready for HR Spoken Interview Transcript Analysis?

Maity, Subhankar, Deroy, Aniket, Sarkar, Sudeshna

arXiv.org Artificial IntelligenceApr-9-2025

This research paper presents a comprehensive analysis of the performance of prominent pre-trained large language models (LLMs), including GPT-4 Turbo, GPT-3.5 Turbo, text-davinci-003, text-babbage-001, text-curie-001, text-ada-001, llama-2-7b-chat, llama-2-13b-chat, and llama-2-70b-chat, in comparison to expert human evaluators in providing scores, identifying errors, and offering feedback and improvement suggestions to candidates during mock HR (Human Resources) interviews. We introduce a dataset called HURIT (Human Resource Interview Transcripts), which comprises 3,890 HR interview transcripts sourced from real-world HR interview scenarios. Our findings reveal that pre-trained LLMs, particularly GPT-4 Turbo and GPT-3.5 Turbo, exhibit commendable performance and are capable of producing evaluations comparable to those of expert human evaluators. Although these LLMs demonstrate proficiency in providing scores comparable to human experts in terms of human evaluation metrics, they frequently fail to identify errors and offer specific actionable advice for candidate performance improvement in HR interviews. Our research suggests that the current state-of-the-art pre-trained LLMs are not fully conducive for automatic deployment in an HR interview assessment. Instead, our findings advocate for a human-in-the-loop approach, to incorporate manual checks for inconsistencies and provisions for improving feedback quality as a more suitable strategy.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2504.05683

Country: Asia > India (0.46)

Genre:

Research Report > New Finding (1.00)
Personal > Interview (1.00)

Industry: Education > Educational Setting (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Revisiting Classification Taxonomy for Grammatical Errors

Zou, Deqing, Ye, Jingheng, Liu, Yulu, Wu, Yu, Xu, Zishan, Li, Yinghui, Zheng, Hai-Tao, An, Bingxu, Wei, Zhao, Xu, Yong

arXiv.org Artificial IntelligenceFeb-17-2025

Grammatical error classification plays a crucial role in language learning systems, but existing classification taxonomies often lack rigorous validation, leading to inconsistencies and unreliable feedback. In this paper, we revisit previous classification taxonomies for grammatical errors by introducing a systematic and qualitative evaluation framework. Our approach examines four aspects of a taxonomy, i.e., exclusivity, coverage, balance, and usability. Then, we construct a high-quality grammatical error classification dataset annotated with multiple classification taxonomies and evaluate them grounding on our proposed evaluation framework. Our experiments reveal the drawbacks of existing taxonomies. Our contributions aim to improve the precision and effectiveness of error analysis, providing more understandable and actionable feedback for language learners.

artificial intelligence, natural language, revisiting classification taxonomy, (1 more...)

arXiv.org Artificial Intelligence

2502.1189

Genre: Research Report (0.69)

Technology: Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)

Add feedback

Efficient Standardization of Clinical Notes using Large Language Models

Hier, Daniel B., Carrithers, Michael D., Do, Thanh Son, Obafemi-Ajayi, Tayo

arXiv.org Artificial IntelligenceDec-31-2024

Clinician notes are a rich source of patient information but often contain inconsistencies due to varied writing styles, colloquialisms, abbreviations, medical jargon, grammatical errors, and non-standard formatting. These inconsistencies hinder the extraction of meaningful data from electronic health records (EHRs), posing challenges for quality improvement, population health, precision medicine, decision support, and research. We present a large language model approach to standardizing a corpus of 1,618 clinical notes. Standardization corrected an average of $4.9 +/- 1.8$ grammatical errors, $3.3 +/- 5.2$ spelling errors, converted $3.1 +/- 3.0$ non-standard terms to standard terminology, and expanded $15.8 +/- 9.1$ abbreviations and acronyms per note. Additionally, notes were re-organized into canonical sections with standardized headings. This process prepared notes for key concept extraction, mapping to medical ontologies, and conversion to interoperable data formats such as FHIR. Expert review of randomly sampled notes found no significant data loss after standardization. This proof-of-concept study demonstrates that standardization of clinical notes can improve their readability, consistency, and usability, while also facilitating their conversion into interoperable data formats.

abbreviation, large language model, natural language, (17 more...)

arXiv.org Artificial Intelligence

2501.00644

Country:

North America > United States > Missouri > Greene County > Springfield (0.04)
North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Kentucky (0.04)

Genre:

Research Report > Experimental Study (1.00)
Research Report > Strength High (0.68)

Industry:

Health & Medicine > Health Care Technology > Medical Record (1.00)
Health & Medicine > Therapeutic Area > Neurology > Multiple Sclerosis (0.47)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.77)
Information Technology > Artificial Intelligence > Natural Language > Text Processing (0.69)

Add feedback

Tibyan Corpus: Balanced and Comprehensive Error Coverage Corpus Using ChatGPT for Arabic Grammatical Error Correction

Alrehili, Ahlam, Alhothali, Areej

arXiv.org Artificial IntelligenceNov-7-2024

Natural language processing (NLP) utilizes text data augmentation to overcome sample size constraints. Increasing the sample size is a natural and widely used strategy for alleviating these challenges. In this study, we chose Arabic to increase the sample size and correct grammatical errors. Arabic is considered one of the languages with limited resources for grammatical error correction (GEC). Furthermore, QALB-14 and QALB-15 are the only datasets used in most Arabic grammatical error correction research, with approximately 20,500 parallel examples, which is considered low compared with other languages. Therefore, this study aims to develop an Arabic corpus called "Tibyan" for grammatical error correction using ChatGPT. ChatGPT is used as a data augmenter tool based on a pair of Arabic sentences containing grammatical errors matched with a sentence free of errors extracted from Arabic books, called guide sentences. Multiple steps were involved in establishing our corpus, including the collection and pre-processing of a pair of Arabic texts from various sources, such as books and open-access corpora. We then used ChatGPT to generate a parallel corpus based on the text collected previously, as a guide for generating sentences with multiple types of errors. By engaging linguistic experts to review and validate the automatically generated sentences, we ensured that they were correct and error-free. The corpus was validated and refined iteratively based on feedback provided by linguistic experts to improve its accuracy. Finally, we used the Arabic Error Type Annotation tool (ARETA) to analyze the types of errors in the Tibyan corpus. Our corpus contained 49 of errors, including seven types: orthography, morphology, syntax, semantics, punctuation, merge, and split. The Tibyan corpus contains approximately 600 K tokens.

chatgpt, corpus, correction, (14 more...)

arXiv.org Artificial Intelligence

2411.04588

Country:

North America > United States > Arizona (0.04)
Asia > Middle East > Qatar (0.04)
North America > United States > Minnesota (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Education (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Grammars & Parsing (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Grammatical Error Feedback: An Implicit Evaluation Approach

Bannò, Stefano, Knill, Kate, Gales, Mark J. F.

arXiv.org Artificial IntelligenceAug-18-2024

Grammatical feedback is crucial for consolidating second language (L2) learning. Most research in computer-assisted language learning has focused on feedback through grammatical error correction (GEC) systems, rather than examining more holistic feedback that may be more useful for learners. This holistic feedback will be referred to as grammatical error feedback (GEF). In this paper, we present a novel implicit evaluation approach to GEF that eliminates the need for manual feedback annotations. Our method adopts a grammatical lineup approach where the task is to pair feedback and essay representations from a set of possible alternatives. This matching process can be performed by appropriately prompting a large language model (LLM). An important aspect of this process, explored here, is the form of the lineup, i.e., the selection of foils. This paper exploits this framework to examine the quality and need for GEC to generate feedback, as well as the system used to generate feedback, using essays from the Cambridge Learner Corpus.

computational linguistic, information, lexical information, (15 more...)

arXiv.org Artificial Intelligence

2408.09565

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Europe > Czechia > Prague (0.05)
North America > Mexico > Mexico City > Mexico City (0.04)
(8 more...)

Genre: Research Report (1.00)

Industry: Education > Curriculum > Subject-Specific Education (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.54)

Add feedback

Characterizing and Evaluating the Reliability of LLMs against Jailbreak Attacks

Chen, Kexin, Liu, Yi, Wang, Dongxia, Chen, Jiaying, Wang, Wenhai

arXiv.org Artificial IntelligenceAug-17-2024

Large Language Models (LLMs) have increasingly become pivotal in content generation with notable societal impact. These models hold the potential to generate content that could be deemed harmful.Efforts to mitigate this risk include implementing safeguards to ensure LLMs adhere to social ethics.However, despite such measures, the phenomenon of "jailbreaking" -- where carefully crafted prompts elicit harmful responses from models -- persists as a significant challenge. Recognizing the continuous threat posed by jailbreaking tactics and their repercussions for the trustworthy use of LLMs, a rigorous assessment of the models' robustness against such attacks is essential. This study introduces an comprehensive evaluation framework and conducts an large-scale empirical experiment to address this need. We concentrate on 10 cutting-edge jailbreak strategies across three categories, 1525 questions from 61 specific harmful categories, and 13 popular LLMs. We adopt multi-dimensional metrics such as Attack Success Rate (ASR), Toxicity Score, Fluency, Token Length, and Grammatical Errors to thoroughly assess the LLMs' outputs under jailbreak. By normalizing and aggregating these metrics, we present a detailed reliability score for different LLMs, coupled with strategic recommendations to reduce their susceptibility to such vulnerabilities. Additionally, we explore the relationships among the models, attack strategies, and types of harmful content, as well as the correlations between the evaluation metrics, which proves the validity of our multifaceted evaluation framework. Our extensive experimental results demonstrate a lack of resilience among all tested LLMs against certain strategies, and highlight the need to concentrate on the reliability facets of LLMs. We believe our study can provide valuable insights into enhancing the security evaluation of LLMs against jailbreak within the domain.

jailbreak attack, language model, llm, (12 more...)

arXiv.org Artificial Intelligence

2408.09326

Country:

Asia > China > Zhejiang Province > Hangzhou (0.04)
North America > United States (0.04)
North America > Canada (0.04)
(2 more...)

Genre: Research Report > New Finding (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Government (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.68)

Add feedback